Blog Posts

Iceberg Series, Part 6: Multi-Engine & Maintenance

Querying Iceberg from Trino, Flink, and DuckDB; expiring snapshots; rewriting data files; and keeping Iceberg tables healthy in production.

Iceberg Series, Part 5: Row-Level Operations

How MERGE, UPDATE, and DELETE work in Iceberg — copy-on-write vs merge-on-read, when to use each, and the performance trade-offs.

Iceberg Series, Part 4: Hidden Partitioning & Evolution

Partition transforms that derive partition values automatically, partition evolution that changes strategy without rewriting data, and why these are Iceberg's biggest ergonomic wins.

Iceberg Series, Part 3: Catalogs

How Hive, Glue, REST, and Nessie catalogs coordinate multi-engine access to Iceberg tables — and why the catalog abstraction is Iceberg's biggest differentiator.

Iceberg Series, Part 2: Table Format Internals

The four-layer metadata hierarchy — table metadata, manifest lists, manifest files, and data files — and how it enables efficient scans and snapshot isolation.

Iceberg Series, Part 1: Getting Started

Creating Iceberg tables with Spark, reading and writing data, running MERGE, using time travel, and inspecting table history.

Iceberg Series, Part 0: Overview

What Apache Iceberg is, how it differs from Delta Lake and Hudi, and why multi-engine interoperability is its defining advantage.